______________________________________________
README: NCHLT text resource development annotated data
______________________________________________
1. About NCHLT text resource development annotated data
2. License
______________________________________________

1. NCHLT text resource development annotated data: Named entity

This directory contains named entity annotated data for the NCHLT Text Resource Development: Phase II Project.
The named entity annotated data combines the 50,000 tokens annotated during the NCHLT text resource development project with additional data from the previously collected NCHLT text corpora. This annotated set consists of a minimum of 15,000 tokens annotated as one of the five phrase types described in the protocol. The phrase types are:
	ORG - Organisation
	LOC - Location
	PER - Person
	MISC - Miscellaneous
	OUT - not considered part of any named entity.

Please see the protocol for more details.

The folder contains the data in both the original annotated lara3 format and a tab-delimited text version of the same file.
In the text documents, each line consists of a token and its annotation, separated by a tab:
token	annotation
An empty annotation indicates that the token is not part of a named entity, i.e. Outside (OUT).

Each sentence in the data is preceded by an empty line.
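
The tab-delimited layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the distributed data or tools: the function name read_sentences is hypothetical, the sample lines are invented for illustration only, and mapping an empty annotation to the label OUT is an assumption based on the phrase type list above.

```python
from typing import Iterable, Iterator, List, Tuple

# One sentence is a list of (token, annotation) pairs.
Sentence = List[Tuple[str, str]]


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from tab-delimited 'token<TAB>annotation' lines.

    Assumptions (not confirmed by the data itself):
    - an empty line marks a sentence boundary;
    - an empty or missing annotation means the token is outside any
      named entity, recorded here as "OUT".
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Empty line: close the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Split on the first tab only; the annotation may be empty.
        token, _, annotation = line.partition("\t")
        sentence.append((token, annotation.strip() or "OUT"))
    if sentence:
        yield sentence


# Invented sample lines, for illustration only (not taken from the data).
sample = ["\n", "Pretoria\tLOC\n", "is\t\n", "\n", "CTexT\tORG\n"]
sentences = list(read_sentences(sample))
```

To read one of the text files, pass an open file object instead of the sample list, for example open(path, encoding="utf-8"), where both the path and the UTF-8 encoding are assumptions to check against the actual files.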


The annotated data sets were developed by the Centre for Text Technology (CTexT, North-West University, South Africa).

See: http://www.nwu.ac.za/ctext for more information.

______________________________________________

2. License

These files are distributed under the Creative Commons Attribution 2.5 South Africa license. 

_______________________________________________
License: Creative Commons Attribution 2.5 South Africa
URL: http://creativecommons.org/licenses/by/2.5/za/

Attribute work to: South African Department of Arts and Culture & Centre for Text Technology (CTexT, North-West University, South Africa)

Attribute work to URL: http://www.nwu.ac.za/ctext 
______________________________________________